Improved surrogate data for nonlinearity tests 



Thomas Schreiber, Andreas Schmitz 
Physics Department, University of Wuppertal, 
D-42097 Wuppertal, Germany 
(Phys. Rev. Lett. 77, 635 (1996)) 



0^ ■ 

a\ ■ 

OS 

CD 

^ : 
o • 



^ . 
o ■ 
o\ ■ 
o 

On , 

a^ ■ 
I 

O 

^ : 
o 



Current tests for nonlinearity compare a time series to 
the null hypothesis of a Gaussian linear stochastic process. 
For this restricted null assumption, random surrogates can 
be constructed which are constrained by the linear properties 
of the data. We propose a more general null hypothesis al- 
lowing for nonlinear rescalings of a Gaussian linear process. 
We show that such rescalings cannot be accounted for by a 
simple amplitude adjustment of the surrogates which leads 
to spurious detection of nonlinearity. An iterative algorithm 
is proposed to make appropriate surrogates which have the 
same autocorrelations as the data and the same probability 
distribution. 
PACS: 05.45. +b 

The paradigm of deterministic chaos has become a very 
attractive concept for the study of the irregular time evo- 
lution of experimental or natural phenomena. Nonlinear 
methods have indeed been successfully applied to labo- 
ratory data from many different systems However, 
soon after the first signatures of low dimensional chaos 
had been reported for field data it turned out that 
nonlinear algorithms can mistake linear correlations, in 
particular those of the power law type, for determinism. 
1^ This has lead on the one hand to more critical appli- 
cations of algorithms like the correlation dimension. ^] 
On the other hand, significance tests have been proposed 
which allow for the detection of nonlinearity even when 
for example a clear scaling region is lacking in the cor- 
relation integral. The idea is to test results against 
the null hypothesis of a specific class of linear random 
processes. 

One of the most popular of such tests is the method of 
"surrogate data" , |^] which can be used with any nonlin- 
ear statistic that characterizes a time series by a single 
number. The value of the nonlinear discriminating statis- 
tic is computed on the measured data and compared to 
its empirical distribution on a collection of Monte Carlo 
realizations of the null hypothesis. Usually, the null as- 
sumption we want to make is not a very specific one, 
like a certain particular autoregressive (AR) process. We 
would rather like to be able to test general assumptions, 
for example that the data is described by some Gaus- 
sian linear random process. Thus we will not try to find 
a specific faithful model of the data; we will rather de- 
sign the Monte Carlo realizations to have the same linear 
properties as the data. The authors of Q call this a "con- 
strained realization" approach. 

In particular, the null hypothesis of autocorrelated 
Gaussian linear noise can be tested with surrogates which 



are by construction Gaussian random numbers but have 
the same autocorrelations as the signal. Due to the 
Wiener-Khinchin theorem, this is the case if their power 
spectra coincide. One can multiply the discrete Fourier 
transform of the data by random phases and then per- 
form the inverse transform (phase randomized surro- 
gates). Equivalently, one can create Gaussian indepen- 
dent random numbers, take their Fourier transform, re- 
place those amplitudes with the amplitudes of the Fourier 
transform of the original data, and then invert the Fourier 
transform. This is similar to a filter in the frequency do- 
main. Here the "filter" is the quotient of the desired and 
the actual Fourier amplitudes. 

In practice, the above null hypothesis is not as inter- 
esting as one might like: Very few of the time series con- 
sidered for a nonlinear treatment pass even a simple test 
for Gaussianity. Therefore we want to consider a more 
general null hypothesis including the possibility that the 
data were measured by an instantaneous, invertible mea- 
surement function h which does not depend on time n. 
A time series {s„}, n = 1, . . . , iV is consistent with this 
null hypothesis if there exists an underlying Gaussian lin- 
ear stochastic signal {xn} such that s„ = h{xn) for all 
n. If the null hypothesis is true, typical realizations of a 
process which obeys the null are expected to share the 
same power spectrum and amplitude distribution. But 
even within the class defined by the null hypothesis, dif- 
ferent processes will result in different power spectra and 
distributions. It is now an essential requirement that the 
discrimiating statistics must not mistake these variations 
for deviations from the null hypothesis. The tedious way 
to achieve this is by constructing a "pivotal" statistics 
which is insensitive to these differences. The alterna- 
tive we will pursue here is the "constrained realizations" 
approach: the variations in spectrum and distribution 
within the class defined by the null hypothesis are sup- 
pressed by constraining the surrogates to have the same 
power spectrum as well as the same distribution of values 
as the data. 

In 1^, the amplitude adjusted Fourier transform 
(AAFT) algorithm is proposed for the testing of this null 
hypothesis. First, the data {sn} is rendered Gaussian 
by rank-ordering according to a set of Gaussian ran- 
dom numbers. The resulting series = g{sn) is Gaus- 
sian but follows the measured time evolution {s„}. Now 
make phase randomized surrogates for {s^}, call them 
{sjj}. Finally, invert the rescaling g by rank-ordering 
{s^} according to the distribution of the original data, 
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FIG. 1. Discrepancy of the power spectra of human breath 
rate data (solid hne) and 19 AAFT surrogates (dashed hues). 
Here the power spectra have been computed with a square 
window of length 64.. 

Sn =g{s'n)- 

The AAFT algorithm should be correct asymptotically 
in the limit as — > oo. ||] For finite N however, {s„} and 
{s„} have the same distributions of amplitudes by con- 
struction, but they do not usually have the same sample 
power spectra. One of the reasons is that the phase ran- 
domization procedure performed on {s^} preserves the 
Gaussian distribution only on average. The fluctuations 
of {s^} and {sj^} will differ in detail. The nonlinearity 
contained in the amplitude adjustment procedure (jj is 
not equal to g^^) will turn these into a bias in the em- 
pirical power spectrum. Such systematic errors can lead 
to false rejections of the null hypothesis if a statistic is 
used which is sensitive to autocorrelations. The second 
reason is that g isn't really the inverse of the nonlinear 
measurement function ft,, and instead of recovering {a;„} 
we will find some other Gaussian series. Even if {s„} 
were Gaussian, g would not be the identity. Again, the 
two rescalings will lead to an altered spectrum. 

In Fig. 1^ we see power spectral estimates of a clinical 
data set and of 19 AAFT surrogates. The data is taken 
from data set B of the Santa Fe Institute time series 
contest |9| . It consists of 4096 samples of the breath rate 
of a patient with sleep apnea. The sampling interval is 
0.5 seconds. The discrepancy of the spectra is significant. 
A bias towards a white spectrum is noted: power is taken 
away from the main peak to enhance the low and high 
frequencies. 

The purpose of this letter is to propose an alternative 
method of producing surrogate data sets which have the 
same power spectrum and distribution as a given data 
set. We do not expect that these two requirements can 
be exactly fulfilled at the same time for finite N ^ except 
for the trivial solution, a cyclic shift of the data set it- 
self. We will rather construct sequences which assume 
the same values (without replacement) as the data and 
which have spectra which are practically indistinguish- 
able from that of the data. We can require a specific 




1 10 100 1000 

iterations 

FIG. 2. Convergence of the iterative scheme to the correct 
power spectrum while the distribution is kept fixed. First 
order AR process with nonlinear measurement. The curves 
were obtained with N = 1024, 2048, . . . , 32768, counted from 
above. We also show the curve oc l/i. 

maximal discrepancy in the power spectrum and report 
a failure if this accuracy could not be reached. 

The algorithm consists of a simple iteration scheme. 
Store a sorted list of the values {s„} and the squared 
amplitudes of the Fourier transform of {s„}, S*^ — 
I TZZo s„e^2'^'="/^ p. Begin with a random shuffle (with- 
out replacement) {sn°''} of the data. [|lO| Now each iter- 
ation consists of two consecutive steps. First {s^*-*} is 
brought to the desired sample power spectrum. This is 
achieved by taking the Fourier transform of {sn^}, replac- 
ing the squared amplitudes {S'^'*''''} by {S\} and then 
transforming back. The phases of the complex Fourier 
components are kept. Thus the first step enforces the cor- 
rect spectrum but usually the distribution will be mod- 
ified. Therefore, as the second step, rank-order the re- 
sulting series in order to assume exactly the values taken 
by {sn\- Unfortunately, the spectrum of the resulting 
{sn^^''} will be modified again. Therefore the two steps 
have to be repeated several times. 

At each iteration stage we can check the remaining dis- 
crepancy of the spectrum and iterate until a given accu- 
racy is reached. For finite N we don't expect convergence 
in the strict sense. Eventually, the transformation to- 
wards the correct spectrum will result in a change which 
is too small to cause a reordering of the values. Thus 
after rescaling, the sequence is not changed. 

In Fig. |2| we show the convergence of the iteration 
scheme as a function of the iteration count i and the 
length of the time series N . The data here was a first 
order AR process Xn = 0.7a;„_i -I- measured through 
Sn — x'^i- The increments ry„ are independent Gaussian 
random numbers. For each N = 1024, 2048, . . . , 32768 
we create a time series and ten surrogates. In order to 
quantify the convergence, the spectrum was estimated 
by 5^ = I T,n=o s„e*2^'="/^|2 and smoothed over 21 fre- 
quency bins. Si — X]j=fe-io '5'fc/21- Note that for the 
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FIG. 3. For the same process as used in Fig. |^ we show the 
saturation value of the accuracy for the above values of A'^. 
The straight line is oc 



generation of surrogates no smoothing is performed. As 
the (relative) discrepancy of the spectrum at the i-th 

iteration we use Y^^^oi^k^ ~ I Y^k=o ^t- ^ot sur- 
prisingly, progress is fastest in the first iteration, where 
the random scramble is initially brought from its white 
spectrum to the desired one (the initial discrepancy of 
the scramble was 0.2 ±0.01 for all cases and is not shown 
in Fig. H). For i > 1, the discrepancy of the spectrum de- 
creases approximately like 1 /i until an TV dependent sat- 
uration is reached. The saturation value seems to scale 
like an inverse power of A'^ which depends on the pro- 
cess. For the data underlying Fig. || we find a 1/VN 
dependence, see Fig. ^. For comparison, the discrep- 
ancy for AAFT surrogates did not fall below 0.015 for 
all N. We have observed similar scaling behavior for a 
variety of other linear correlated processes. For data from 
a discretized Mackey-Glass equation we found exponen- 
tial convergence cx exp(— 0.4i) before a saturation value 
was reached which decreases approximately like 
Although we found rapid convergence in all examples we 
have studied so far, the rate seems to depend both on 
the distribution of the data and the nature of the corre- 
lations. The details of the behavior are not yet under- 
stood. 

In order to verify that false rejections are indeed 
avoided by this scheme we compared the number of false 
positives in a test for nonlinearity for the AAFT algo- 
rithm and the iterative scheme, the latter as a func- 
tion of the number of iterations. We performed tests on 
data sets of 2048 points generated by the instantaneously, 
monotonously distorted AR process s„ = a;„ y^jx^, a;„ — 
0.95a:„_i + rjn. The discriminating statistic was a nonlin- 
ear prediction error obtained with locally constant fits in 
two dimensional delay space. For each test, 19 surrogates 
were created and the null hypothesis was rejected at the 
95% level of significance if the prediction error for the 
data was smaller then those of the 19 surrogates. The 
number of false rejections was estimated by performing 
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FIG. 4. Percentage of false rejections as a function of the 
number of iterations performed. Horizontal line: nominal re- 
jection rate at the 95% level of significance. In this case, 7 
iterations are sufficient to render the test accurate. The usual 
AAFT algorithm yields 66% false rejections. 



300 independent tests. Instead of the expected 5% false 
positives we found 66±5% false rejections with the AAFT 
algorithm. Fig. ^ shows the percentage of false rejections 
as a function of the number of iterations of the scheme 
described in this letter. The correct rejection rate for the 
95% level of significance is reached after about 7 itera- 
tions. This example is particularly dramatic because of 
the strong correlations, although the nonlinear rcscaling 
is not very severe. 

Let us make some further remarks on the proposed al- 
gorithm. We decided to use an unwindowed power spec- 
tral estimate which puts quite a strong constraint on the 
surrogates (the spectrum fixes A^/2 parameters). Thus it 
cannot be excluded that the iterative scheme is able to 
converge only by also adjusting the phases of the Fourier 
transform in a nontrivial way. This might introduce spu- 
rious nonlinearity in the surrogates in which case we can 
find the confusing result that there is less nonlinearity in 
the data than in the surrogates. If the null hypothesis 
is wrong, we expect more nonlinearity in the data (bet- 
ter nonlinear predictability, smaller estimated dimension 
etc.). Therefore we can always use one-sided tests and 
thus avoid additional false rejections. However, spurious 
structure in the surrogates can diminish the power of the 
statistical test. Since an unwindowed power spectral es- 
timate shows strong fluctuations within each frequency 
bin, it seems unnecessary to require the surrogates to 
have exactly the same spectrum as the data, including 
the fluctuations. The variance of the spectral estimate 
can be reduced for example by windowing, but the fre- 
quency content of the windowing function introduces an 
additional bias. 

Let us finally remark that although the null hypothesis 
of a Gaussian linear process measured by a monotonous 
function is the most general we have a proper statistical 
test for, its rejection does not imply nonlinear dynamics. 
For instance, noninstantaneous measurement functions 
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(e.g., Sn = x^Xn-i) are not included and (correctly) lead 
to a rejection of the null hypothesis, although the under- 
lying dynamics may be linear. Another example is first 
differences of the distorted output from a Gaussian linear 
process. Q 

In conclusion, wc established an algorithm to provide 
surrogate data sets containing random numbers with a 
given sample power spectrum and a given distribution of 
values. The achievable accuracy depends on the nature 
of the data and in particular the length of the time series. 
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